| threat_type | count |
|---|---|
| shoot | 2421 |
| threat | 2331 |
| point | 1621 |
| attack | 1288 |
| move | 430 |
| undetermined | 303 |
| flee | 160 |
| accident | 49 |
8.550490840064233
Introduction
This is an analysis of police shootings in the United States.
It will be performed by:
- looking at the police killing dataset on its own
- cross-referencing the police killing dataset with demographic data on the state and the CBSA (core-based statistical area, a collection of interlinked counties such as a metropolitan area) to determine {THE}
The goal of this analysis is to establish which statistical factors influence the number of police shootings and {WHTEHRT}
1.2.1 Chronological Distribution of Police Shootings
In total there were 8635 recorded police shootings between 2015-01-02 and 2023-07-22, averaging about 1010 per year.
The number of police shootings was relatively stable prior to 2020, at about 80 per month, although there was significant month-to-month variance. In recent years it has increased to about 90 per month.
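The yearly average quoted above can be reproduced from the total and the two dates. The snippet below is a minimal sketch using only those three figures; the 365.25 days-per-year conversion is an assumption, and the original notebook may have computed the span differently.

```python
from datetime import date

# Figures quoted in the text above.
total_shootings = 8635
first_record = date(2015, 1, 2)
last_record = date(2023, 7, 22)

# Length of the observation window in years.
# Assumption: 365.25 days per year as an approximation.
years = (last_record - first_record).days / 365.25

per_year = total_shootings / years
per_month = per_year / 12

print(f"{years:.2f} years, {per_year:.0f} per year, {per_month:.0f} per month")
```

The overall monthly average falls between the pre-2020 and recent monthly levels mentioned in the text, as expected for a series that rose over time.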
1.2.2 Race Data
| column | dtype |
|---|---|
| threat_type | category |
| flee_status | category |
| armed_with | category |
| age | float64 |
| gender | category |
| race | category |
| was_mental_illness_related | bool |
| body_camera | bool |
| age_bracket_short | category |
| INC110213 | int64 |
1.2.4 Armed With and Threat Type
1.2.7 Age Analysis
1.2.9 Body Camera
Intuitively, we might expect that increasing usage of body cameras would have resulted in a decrease in police shootings; however, this has not been the case. There are several possible explanations:
- Body cameras are being rolled out too slowly. Any effect they might have had has been overshadowed by the increase in police killings (possibly related to the COVID-19 pandemic).
- There is no meaningful relationship between camera usage and shootings, or it is very weak.
We can't measure this relationship statistically without additional data, such as a dataset of all police encounters and their outcomes, which is not realistically obtainable. There might be other approaches for estimating the effect of body cameras that are worth investigating.
1.2.10 Correlations
Variable pairs with a Spearman correlation coefficient below -0.4 or above 0.4:
| | var I | var II | coef |
|---|---|---|---|
| 0 | armed_with_gun | threat_type_attack | -0.43 |
| 1 | armed_with_gun | armed_with_knife | -0.53 |
| 2 | flee_status_foot | flee_status_not | -0.42 |
| 3 | gender_female | gender_male | -0.94 |
| 4 | age | age_bracket_short_45+ | 0.75 |
| 5 | armed_with_gun | threat_type_shoot | 0.51 |
| 6 | armed_with_undetermined | threat_type_undetermined | 0.49 |
| 7 | race_Black | race_White | -0.47 |
| 8 | flee_status_car | flee_status_not | -0.47 |
| 9 | age | age_bracket_short_25 or younger | -0.68 |
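A table like the one above can be produced by computing the Spearman correlation matrix of the one-hot encoded columns and keeping the pairs beyond the threshold. The following is a minimal sketch, not the notebook's actual code: the function name `strong_spearman_pairs` and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd


def strong_spearman_pairs(df: pd.DataFrame, threshold: float = 0.4) -> pd.DataFrame:
    """Return variable pairs whose absolute Spearman coefficient exceeds `threshold`."""
    corr = df.corr(method="spearman")
    # Keep only the upper triangle so each pair appears once and the diagonal is dropped.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().reset_index()
    pairs.columns = ["var I", "var II", "coef"]
    # NaN coefficients (if any survive stacking) fail the comparison and are excluded.
    strong = pairs[pairs["coef"].abs() > threshold]
    return strong.reset_index(drop=True)


# Toy one-hot encoded data (made up for illustration, not the real dataset).
toy = pd.get_dummies(
    pd.DataFrame(
        {
            "gender": ["male", "male", "female", "male", "female", "male"],
            "armed_with": ["gun", "knife", "gun", "gun", "knife", "unarmed"],
        }
    ),
    dtype=float,
)
print(strong_spearman_pairs(toy))
```

Note that dummy columns derived from the same categorical variable (such as `gender_female` and `gender_male`) are negatively correlated by construction, which explains several rows in the table above.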
1.3 Analysis
After examining the dataset and performing some basic hypothesis testing, we found some significant differences between the characteristics of victims depending on their race and age:
- Black people are more likely than other groups to be killed after shooting or while actively attacking someone
- Hispanic people and people of other races are more likely to be killed when not armed with a firearm than Black or white people
- Killings of white individuals are more likely to be related to mental health issues
- Killings of Black people are more likely than those of other groups to have no determined reason
- White people are significantly less likely to be killed when the officer is wearing a body camera
- Non-white people are significantly more likely to be killed when unarmed
- People who are 45 or older are more likely to be killed while pointing a firearm
However, without additional data we can't explain whether these relationships exist due to some underlying reason (e.g. systemic discrimination or biases of law enforcement agents, socioeconomic differences between racial groups, etc.). This needs to be investigated further.
Even if we were able to provide a more reliable explanation for these relationships, that does not mean we would be able to derive actionable decisions for the United States Department of Justice. Addressing them might require enacting complex socioeconomic policies, which is not something the department controls.
Actionable Decisions
Another important aspect to take into account is that, while all police shootings are regrettable, the majority of them are justifiable in the sense that the victim was shot while committing a violent crime and threatening the life and safety of other individuals and/or police officers.
While it's possible that prior training of police officers to handle such situations using less lethal methods could decrease the number of deaths, this is not something we can analyze using the data we have.
Instead we'll focus on demographic, social, economic and other macro factors which can be used to explain the varying levels of police shootings between different states, in order to:
1. Determine the factors which explain the variance in police shootings.
2. Find factors which can be influenced by federal and local governments.
This might allow police departments in different states to adopt policies, training standards etc. from other jurisdictions, which is potentially a relatively straightforward way to decrease the incidence of police shootings.
1.3.1 Explaining Differences Between States
One possible approach is to find demographically similar US states which have significantly different numbers of shootings per capita. If such states exist, we can check whether the difference can be explained by some other variable or attribute which could, in theory, be influenced by local or state governments.
1.3.1.1 Homicide Rates
We would expect the number of police shootings to be more or less proportional to the level of violent crime in any given state:
R² = 0.074
p-value = 0.056
The relationship is not statistically significant.
Interestingly, the correlation between homicides and police shootings is very low, and only a small proportion of the variability in police shootings between states (about 7%, given R² = 0.074) can be explained by the homicide rate.
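The R² and p-value reported above are the kind of figures a simple linear regression between two per-state variables yields. Below is a hedged sketch using `scipy.stats.linregress` on synthetic data; the helper name `describe_relationship`, the 0.05 significance threshold, and all numbers in the example are assumptions, not values from the analysis.

```python
import numpy as np
from scipy import stats


def describe_relationship(x, y, alpha: float = 0.05) -> dict:
    """Fit y = a + b*x by least squares and report R² and the slope's p-value."""
    fit = stats.linregress(x, y)
    return {
        "r_squared": fit.rvalue ** 2,
        "p_value": fit.pvalue,
        # Assumption: the conventional 0.05 threshold decides significance.
        "significant": fit.pvalue < alpha,
    }


# Synthetic per-state values (made up): a weak linear signal buried in noise.
rng = np.random.default_rng(0)
homicide_rate = rng.normal(50, 15, size=51)
shooting_rate = 0.02 * homicide_rate + rng.normal(0, 1.5, size=51)

result = describe_relationship(homicide_rate, shooting_rate)
print(f"R² = {result['r_squared']:.3f}, p-value = {result['p_value']:.3f}")
```

With only about 50 observations, a p-value just above or below 0.05 is fragile, so both results in this section are best read as weak evidence rather than a firm yes or no.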
1.3.1.2 Police Spending Per Capita
Next, let's look at spending on law enforcement per capita (adjusted by per capita income in the state):
R² = 0.101
p-value = 0.022
The relationship is statistically significant.
While the relationship between spending on law enforcement and the number of police shootings is relatively weak, surprisingly it is positive and statistically significant: the more a state spends on police, the more people end up being shot. We shouldn't draw conclusions based on this alone, though. It's possible that there are other factors at play:
- There is more crime in poorer states, requiring more resources for law enforcement (however, the weak relationship with the homicide rate already makes this less likely)
- The type of homicides differs (i.e. high levels of drug-related or organized-crime-related homicides probably require more resources to police than high levels of domestic homicides)
- Other sociodemographic variables which are possibly correlated with police spending (e.g. population density) offer a stronger explanation
- The allocation of spending differs (i.e. in some states police officers might be expected to provide services which are provided by other organizations in other states)
- etc.
1.3.1.3 Clustering States by Demographics
Hierarchical clustering is a method of cluster analysis that builds a hierarchy of clusters by repeatedly merging the most similar clusters, here by minimizing the variance of the clusters being merged.
The states that end up on the same branch are most similar to each other based on these factors:
- Persons 65 years and over, percent
- White alone, percent
- Black or African American alone, percent
- Hispanic or Latino, percent
- Foreign born persons, percent
- Language other than English spoken at home, pct age 5+
- High school graduate or higher, percent of persons age 25+
- Bachelor's degree or higher, percent of persons age 25+
- Homeownership rate
- Housing units in multi-unit structures, percent
- Median household income
- Persons below poverty level, percent
- Population per square mile, 2010
- police_prop_income
- Homocide per 1000k
Total Clusters: 9
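The clustering step described above can be sketched with SciPy. This is an illustration under stated assumptions: the variance-minimizing description suggests Ward linkage, but the notebook's actual library, linkage settings and cut-off are not shown, and the feature matrix below is made up.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import zscore

# Made-up feature matrix: rows are states, columns are demographic variables.
rng = np.random.default_rng(1)
states = ["A", "B", "C", "D", "E", "F"]
features = np.vstack(
    [
        rng.normal(0, 0.1, size=(3, 4)),  # three similar states
        rng.normal(5, 0.1, size=(3, 4)),  # three states that differ from the first group
    ]
)

# Standardise each column so variables on large scales (e.g. income) do not dominate.
scaled = zscore(features, axis=0)

# Ward linkage: at each step, merge the two clusters whose union gives the
# smallest increase in within-cluster variance.
tree = linkage(scaled, method="ward")

# Cut the tree into a fixed number of flat clusters (2 here; the analysis reports 9).
labels = fcluster(tree, t=2, criterion="maxclust")
print(dict(zip(states, labels)))
```

Standardising first matters for the variable list above, since it mixes percentages, dollar amounts and population densities.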
1.3 Analysis
Let's build a statistical model to determine which of the demographic and other variables best explain the variance in police shootings between states.
We avoid Random Forest because the low number of observations would likely result in overfitting.
Multiple linear regression is possibly also not the best option due to the high number of independent variables relative to the number of observations.
Let's look at the correlation between the independent variables before we choose a model:
1.3.1 Correlation and Preparing the Dataset
1.3.2 Elastic Net Linear Regression Model
Considering that there is strong correlation between many of the variables, we'll use the Elastic Net model rather than, for instance, Lasso regression, which tends to keep only one variable from a group of correlated variables.
| variable | coefficient |
|---|---|
| Persons 65 years and over, percent | -0.131234 |
| White alone, percent | 0.000000 |
| Foreign born persons, percent | -0.000000 |
| High school graduate or higher, percent of persons age 25+ | 0.000000 |
| Bachelor's degree or higher, percent of persons age 25+ | -0.000000 |
| Homeownership rate | -0.220462 |
| Housing units in multi-unit structures, percent | -0.739171 |
| Median household income | 0.000000 |
| Persons below poverty level, percent | 0.064068 |
| Population per square mile, 2010 | -0.422243 |
| police_prop_income | 0.466692 |
| Homocide per 1000k | 0.000000 |
| home_price_to_income | 0.077081 |
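For readers unfamiliar with the model, here is a minimal sketch of fitting an Elastic Net with scikit-learn on standardised predictors. Everything in it is an assumption for illustration: the data are synthetic, and the `alpha` and `l1_ratio` values are arbitrary rather than the ones used to obtain the coefficients above.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

# Synthetic data (made up): 51 "states", 6 predictors, two of them strongly correlated.
rng = np.random.default_rng(42)
X = rng.normal(size=(51, 6))
X[:, 1] = X[:, 0] + rng.normal(scale=0.05, size=51)  # near-duplicate of column 0
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=51)

# Standardise predictors so the penalty treats every coefficient on the same scale.
X_scaled = StandardScaler().fit_transform(X)

# alpha and l1_ratio are arbitrary illustrative values, not those of the analysis.
model = ElasticNet(alpha=0.3, l1_ratio=0.5)
model.fit(X_scaled, y)

for i, coef in enumerate(model.coef_):
    print(f"x{i}: {coef: .3f}")
```

The L1 part of the penalty can shrink uninformative coefficients to exactly zero, which is why several variables in the output above show 0.000000, while the L2 part tends to spread weight across correlated predictors instead of picking one arbitrarily.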
1.3.3 Interpreting Model Results
Any conclusions based on these results should be taken with a grain of salt; however, they do show some possibly surprising findings:
Racial diversity (the proportion of non-white population) has no influence on the number of shootings per capita in this model.
However, population density and concentration seem to be important factors. Specifically, the proportion of housing units in multi-unit structures (apartments) is the strongest predictor, with a negative coefficient. Several interpretations are possible, none of them straightforward; in combination with population density this might imply that:
- Police officers tend to behave differently depending on how likely other people and bystanders are to witness their actions.
- Officers may feel less safe in less densely populated areas because it might take longer for other officers to reach them.
- People shot by police are more likely to die in areas with poor coverage by emergency services, where it takes a long time for help to arrive.
We can't test the validity of any of these hypotheses here, but they might be worth examining further because they all seem highly actionable (improving police training, strategies for acting around dangerous individuals such as waiting for backup, etc.).
The homicide rate seems to have no effect on the number of police shootings, while the amount spent per capita on policing in the state is a relatively strong predictor.
This suggests that there is no link between the general level of extreme violence in a state and the number of police shootings. This is highly concerning, since using deadly force is only justifiable when the life of the officer or somebody else is in danger. Yet there seems to be no relationship between the actual likelihood of a life-threatening event occurring and the decision by a law enforcement officer to use deadly force.
This certainly should be investigated further and is also possibly highly actionable, especially because certain states (like New York) handle this much better, and their practices might be applied in states which handle it much worse, like New Mexico.
High police spending seems to have a moderate effect on the incidence of police shootings; combined with the homicide statistics, this is also highly concerning. Increased spending on police, in this case at least, seems to produce a worse outcome. It's hard to determine why this might be the case. However, it's possible that significant proportions of funding are misallocated (e.g. spent on unnecessary equipment) and might be better used to improve training. Even barring that, it might mean that a smaller police presence could decrease the number of police shootings while having no effect on the murder rate (it's important to note that other crime statistics are not taken into account here).
State Comparisons
| state | Persons 65 years and over, percent | White alone, percent | Black or African American alone, percent | Hispanic or Latino, percent | Foreign born persons, percent | Language other than English spoken at home, pct age 5+ | High school graduate or higher, percent of persons age 25+ | Bachelor's degree or higher, percent of persons age 25+ | Homeownership rate | Housing units in multi-unit structures, percent | Median household income | Persons below poverty level, percent | Population per square mile, 2010 | police_prop_income | Homocide per 1000k | Police |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Country | -0.145377 | -0.124802 | 0.140295 | 0.588084 | 0.659606 | 0.649198 | -0.493532 | 0.033700 | -0.274418 | 0.218938 | -0.054630 | 0.197834 | -0.213494 | 0.909540 | NaN | 0.562399 |
| AL | 0.302599 | -0.716872 | 1.391715 | -0.738461 | -0.905517 | -0.967779 | -1.402259 | -1.031485 | 0.604848 | -0.835983 | -1.181015 | 1.220178 | -0.208363 | -0.378306 | 1.704902 | -0.683565 |
| AK | -3.001222 | -0.932171 | -0.721795 | -0.469163 | -0.322759 | 0.179753 | 1.261250 | -0.189645 | -0.475917 | 0.010043 | 1.982825 | -1.559322 | -0.276672 | 2.446988 | -0.204900 | 2.256154 |
| AZ | 0.638581 | 0.359619 | -0.647637 | 1.894681 | 0.742857 | 1.285557 | -0.587539 | -0.292728 | -0.366009 | -0.334634 | -0.430974 | 0.996541 | -0.236288 | 0.578509 | 0.164674 | 0.024644 |
| AR | 0.526587 | 0.052050 | 0.362769 | -0.449215 | -0.739015 | -0.759137 | -1.214247 | -1.460995 | 0.055306 | -0.856873 | -1.466839 | 1.411868 | -0.236508 | -0.435314 | 0.805603 | -0.856105 |
Alaska vs Utah
| state | Persons 65 years and over, percent | White alone, percent | Black or African American alone, percent | Hispanic or Latino, percent | Foreign born persons, percent | Language other than English spoken at home, pct age 5+ | High school graduate or higher, percent of persons age 25+ | Bachelor's degree or higher, percent of persons age 25+ | Homeownership rate | Housing units in multi-unit structures, percent | Median household income | Persons below poverty level, percent | Population per square mile, 2010 | police_prop_income | Homocide per 1000k | Police |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Country | -0.145377 | -0.124802 | 0.140295 | 0.588084 | 0.659606 | 0.649198 | -0.493532 | 0.033700 | -0.274418 | 0.218938 | -0.054630 | 0.197834 | -0.213494 | 0.909540 | NaN | 0.562399 |
| AK | -3.001222 | -0.932171 | -0.721795 | -0.469163 | -0.322759 | 0.179753 | 1.261250 | -0.189645 | -0.475917 | 0.010043 | 1.982825 | -1.559322 | -0.276672 | 2.446988 | -0.204900 | 2.256154 |
| UT | -2.665240 | 0.951689 | -0.962810 | 0.199097 | -0.122956 | -0.018457 | 1.041902 | 0.291406 | 0.678120 | -0.261521 | 0.609608 | -0.664770 | -0.252925 | -0.525750 | -0.979561 | -0.744692 |
| state | Persons 65 years and over, percent | White alone, percent | Black or African American alone, percent | Hispanic or Latino, percent | Foreign born persons, percent | Language other than English spoken at home, pct age 5+ | High school graduate or higher, percent of persons age 25+ | Bachelor's degree or higher, percent of persons age 25+ | Homeownership rate | Housing units in multi-unit structures, percent | Median household income | Persons below poverty level, percent | Population per square mile, 2010 | police_prop_income | Homocide per 1000k | Police |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Country | 14.5 | 77.4 | 13.2 | 17.4 | 12.9 | 20.7 | 86.0 | 28.8 | 64.9 | 26.0 | 53046 | 15.4 | 87.4 | 0.014439 | NaN | 406.51677 |
| AK | 9.4 | 66.9 | 3.9 | 6.8 | 7.0 | 16.2 | 91.6 | 27.5 | 63.8 | 24.0 | 70760 | 9.9 | 1.2 | 0.018469 | 66.509938 | 603.02121 |
| UT | 10.0 | 91.4 | 1.3 | 13.5 | 8.2 | 14.3 | 90.9 | 30.3 | 70.1 | 21.4 | 58821 | 12.7 | 33.6 | 0.010676 | 30.921859 | 254.87190 |
One thing we could try is looking at the areas in which police forces have adopted body cameras and checking whether the adoption coincided with a decrease in shootings in these areas.
Limitations
The core of the analysis is based on US states. This means that the number of samples is quite low and might be too low for some models. It might be worth going down a level and using combined statistical areas instead (collections of counties based on interconnected, usually urban, areas).
It would be a good idea to look at more variables, such as the number of police interactions and the likelihood of them ending in a police shooting based on the target's socioeconomic status, race, mental state etc., and other factors such as whether the officer is wearing a body camera, their training level etc. Of course, such datasets are probably unobtainable without significant resources.